Scalability of Hierarchical Meta-learning on Partitioned Data

Authors

  • Philip K. Chan
  • Salvatore J. Stolfo
Abstract

In this paper we study how to scale machine learning algorithms, which are typically designed for datasets that fit in main memory, so that they efficiently learn models from large distributed databases. We have explored an approach called meta-learning, which is related to the traditional data reduction approaches commonly employed in distributed database query processing systems. We explore the scalability of learning arbiter and combiner trees from partitioned data. Arbiter and combiner trees integrate classifiers trained in parallel from small disjoint subsets. Previous work demonstrated the efficacy of these meta-learning architectures in terms of the accuracy of the computed meta-classifiers. Here we discuss the computational performance of constructing arbiter and combiner trees in terms of speedup and scalability as a function of database size and number of partitions. The performance of serial learning algorithms is evaluated. We then analyze the performance of the algorithms used to construct combiner and arbiter trees in parallel. Our empirical results validate these analyses and indicate that the techniques can effectively scale up to large datasets with millions of records using cheap commodity hardware.
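
As a rough illustration of the idea in the abstract (base classifiers trained on small disjoint subsets, then integrated by a meta-classifier), the following is a minimal single-level sketch, not the authors' algorithm. Everything in it is an assumption made for illustration: the one-dimensional toy data, the threshold "stump" base learner, the lookup-table meta-learner, and the names `partition`, `train_stump`, and `train_combiner`. The abstract does not specify how arbiter or combiner trees are built, and for brevity the combiner here is trained on the same records as the base classifiers.

```python
from collections import Counter

def partition(rows, k):
    """Split rows into k disjoint subsets (round-robin)."""
    return [rows[i::k] for i in range(k)]

def train_stump(rows):
    """Hypothetical base learner: choose the threshold that best separates labels."""
    best = None
    for candidate, _ in rows:
        correct = sum(1 for x, y in rows if (1 if x >= candidate else 0) == y)
        if best is None or correct > best[0]:
            best = (correct, candidate)
    best_thr = best[1]
    return lambda v: 1 if v >= best_thr else 0

def train_combiner(base_classifiers, rows):
    """Hypothetical meta-learner: map each tuple of base predictions to its majority label."""
    votes = {}
    for x, y in rows:
        key = tuple(c(x) for c in base_classifiers)
        votes.setdefault(key, Counter())[y] += 1
    table = {key: cnt.most_common(1)[0][0] for key, cnt in votes.items()}

    def meta(v):
        key = tuple(c(v) for c in base_classifiers)
        if key in table:
            return table[key]
        # Unseen combination of predictions: fall back to a plain majority vote.
        return Counter(key).most_common(1)[0][0]

    return meta

# Toy data (assumption): label is 1 exactly when x >= 50.
rows = [(x, 1 if x >= 50 else 0) for x in range(100)]

# Each base classifier sees only its own partition; these could be trained in parallel.
base = [train_stump(p) for p in partition(rows, 4)]

# The combiner learns from the base classifiers' predictions.
meta = train_combiner(base, rows)
accuracy = sum(meta(x) == y for x, y in rows) / len(rows)
```

In this toy setup each base classifier learns a slightly different threshold because it sees only a quarter of the records, and the combiner corrects the resulting disagreements near the boundary. A tree of such combiners, applied level by level, would be one way to extend the sketch toward the hierarchical setting the abstract describes.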

Similar resources

Recasting Gradient-Based Meta-Learning as Hierarchical Bayes

Meta-learning allows an intelligent agent to leverage prior learning episodes as a basis for quickly improving performance on a novel task. Bayesian hierarchical modeling provides a theoretical framework for formalizing meta-learning as inference for a set of parameters that are shared across tasks. Here, we reformulate the model-agnostic meta-learning algorithm (MAML) of Finn et al. (2017) as ...

Scalability of Learning Arbiter and Combiner Trees from Partitioned Data

Much of the research in inductive learning concentrates on problems with relatively small amounts of data residing at one location. In this paper we explore the scalability of learning arbiter and combiner trees from partitioned data. Arbiter and combiner trees integrate classifiers trained in parallel from small disjoint subsets. Previous work demonstrated their efficacy in terms of accuracy, thi...

Data Clustering Based on Key Identification

Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pair-wise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and their associated cluster structures. The recent trend in big data analysis places a more demanding requirement on new clustering algorithms to be both scalable and accura...

An integrated approach for scheduling flexible job-shop using teaching–learning-based optimization method

In this paper, teaching–learning-based optimization (TLBO) is proposed to solve the flexible job-shop scheduling problem (FJSP) using an integrated approach, with the objective of minimizing makespan. The FJSP is an extension of the basic job-shop scheduling problem. It has two subproblems: the routing problem and the sequencing problem. If both the sub problems are solved simultaneously, t...

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithms for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM), a fuzzy learning scheme inspired by some behavioral features of human brain functionality. The high-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

Publication date: 1997